
Robots.txt – The complete checklist for Blogger, WordPress & other


In SEO, everybody talks about keywords, backlinks, and content quality, but there's one tiny file that often goes unnoticed, and it can make or break your website's search performance. That file is robots.txt, and configuring it correctly is essential.
Think of robots.txt as a set of visitor rules for search engines like Google. It tells Google and other crawlers which parts of your website they're allowed to explore and which ones to avoid. Done right, it can improve crawling efficiency, conserve crawl budget, and keep low-value or duplicate sections of your site from being crawled. Done wrong, it can block your site from ranking altogether.
In this guide, we'll walk through a complete robots.txt checklist with detailed explanations, common mistakes, and platform-specific examples for Blogger, WordPress, Shopify, Wix, and more.
By the end, you'll know exactly how to configure robots.txt the right way, whether you're running a personal blog or a massive eCommerce site.
[Image: a digital illustration of a robot holding a robots.txt file, symbolizing website crawling and SEO control.]

1) Robots.txt fundamentals (syntax & placement)

  •  File location: must live at the site root (e.g., https://example.com/robots.txt). Putting it in a subfolder (e.g., /userpages/yourname/robots.txt) does not work.

  •  Scope: a robots.txt file applies only to the host/subdomain it’s served from. If you have multiple subdomains, each needs its own robots.txt.
  •   Blocks crawling, not indexing: disallowed URLs might still be indexed if discovered via external links; use noindex (meta or HTTP header) for true de-indexing.


  •  Minimum structure: every ruleset must begin with a User-agent: line, or nothing is enforced. Paths in Allow/Disallow must start with /.

Minimal skeleton

User-agent: *
Disallow:

(Empty Disallow means “crawl everything.”)

2) Matching rules you’ll actually use 


Robots.txt supports limited wildcards:

  * = matches zero or more characters
  $ = end of URL (end-anchor)

Common, safe patterns

 Block entire site (staging only!)
User-agent: *
Disallow: /

(Useful for staging; be sure to remove on production.)

 Block specific folders
User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/

(Append / to block a whole directory.)

  Allow only Google News; block everyone else
User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /

  Allow all but one bot
User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

  Block single pages
User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html

  Block entire site but allow one public section
User-agent: *
Disallow: /
Allow: /public/

  Block all images from Google Images
User-agent: Googlebot-Image
Disallow: /

(Or block a single image: Disallow: /images/dogs.jpg.)

  Block a file type everywhere (end-anchor)
User-agent: Googlebot
Disallow: /*.gif$

($ ensures you only block URLs ending with .gif.)

  Block URL patterns with query strings
User-agent: *
Disallow: /*?

(Blocks any URL containing ?.)

  Block PHP pages (anywhere in path) vs only those ending with .php
User-agent: *
Disallow: /*.php # contains
Disallow: /*.php$ # ends with .php

(Understand the difference between /*.php and /*.php$.)

3) Fine-grained control with Allow + Disallow


When rules conflict, Google chooses the most specific path (longest matching string). If lengths tie, Allow wins.

  Classic session-ID pattern
Block duplicates with ?, but allow a canonical “?-only” version (ends with ?):
User-agent: *
Allow: /*?$
Disallow: /*?

(Blocks any URL that contains ?, but allows URLs that end with ?.)

  Unblock a “good” page inside a blocked folder
User-agent: *
Allow: /baddir/goodpage
Disallow: /baddir/

(The longer Allow path beats the shorter Disallow.)

  Be careful with overlapping patterns
User-agent: *
Allow: /some
Disallow: /*page

/somepage is blocked (the /*page path is longer).
 


4) Order of precedence & conflict resolution (how crawlers decide)

  Most specific path wins (longest match).
  If equally specific, Google uses the least restrictive rule (i.e., Allow).

5) High-risk mistakes to avoid


  Leaving “Disallow: /” on production after launch.

Keep staging behind password (e.g., HTTP auth) so you can ship the same robots.txt to prod safely.
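For instance, on an Apache host you could protect the staging site with HTTP Basic Auth. A minimal .htaccess sketch, assuming Apache; the realm name and .htpasswd path are placeholders:

AuthType Basic
AuthName "Staging"
AuthUserFile /home/site/.htpasswd
Require valid-user

(Crawlers get a 401 and can't reach the content at all, so the robots.txt file can stay identical to production.)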

  Trying to block hostile scrapers via robots.txt

Bad actors ignore robots.txt. Use firewalls, IP/user-agent blocking, rate limiting, or bot management.

  Listing secret directories in robots.txt

This advertises where your private content lives. Use authentication. Band-aids: noindex meta or X-Robots-Tag (but still not a substitute for security).

  Accidental over-blocking with broad prefixes

Disallow: /admin also blocks /administer-medication….

Safer pair:
Disallow: /admin$
Disallow: /admin/

($ blocks exactly /admin, while the second blocks the folder.)

  Forgetting the User-agent line

Rules won’t apply without it. Also avoid mixing a general block with a later specific bot rule unless you repeat shared rules under each bot block.

  Case sensitivity

Paths are case sensitive. To block all variants, list each case explicitly.
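For example, to cover both capitalizations of a folder (the folder name is a placeholder):

User-agent: *
Disallow: /Downloads/
Disallow: /downloads/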

  Trying to control other subdomains from one robots.txt

Each subdomain needs its own robots.txt at its own root.

  Using robots.txt as “noindex”

Disallow ≠ de-index. Google may still index a disallowed URL if it is discovered through external links, so use a meta noindex tag or the X-Robots-Tag HTTP header for reliable removal from search results.
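For reference, the two standard noindex signals look like this (the meta tag goes in a page's <head>; the header is sent by your server, e.g., for PDFs and other non-HTML files):

<meta name="robots" content="noindex">

X-Robots-Tag: noindex

(Either signal only works if the URL stays crawlable; if robots.txt blocks the page, Google never sees the noindex.)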

6) Debugging & auditing checklist

  Confirm location: https://{host}/robots.txt loads and is publicly readable.

  Validate syntax: every group begins with User-agent. All paths start with /.

  Scan for risky patterns: overly broad prefixes (e.g., /adm) that might catch unrelated pages (e.g., /administer…). Use $ and explicit folder slashes.

 Check wildcards: remember * is greedy; $ pins “end of URL.” Avoid trailing * after bare paths because /fish and /fish* behave the same.

 Conflict resolution: if a URL matches multiple rules, the longest path wins; if tie, Allow wins. Test suspect URLs against your rule set.

  Don’t rely on robots.txt to hide content: for sensitive assets, use authentication; for de-indexing, use noindex (meta or HTTP header).


Robots.txt Examples for Popular Platforms

Now let’s go platform by platform:


 Robots.txt for WordPress

Most WordPress sites generate a default robots.txt file. But it often needs customization.

Recommended WordPress Robots.txt

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml

WordPress Best Practices

  • Don’t block /wp-content/ (contains CSS/JS needed for rendering).

  • Use SEO plugins like Yoast SEO or Rank Math to edit robots.txt directly.

  • Always include your sitemap.


 Robots.txt for Blogger

Blogger provides an option for Custom Robots.txt under Settings → Crawlers & Indexing.

Example Blogger Robots.txt

User-agent: *
Disallow: /search
Allow: /
Sitemap: https://yourblog.blogspot.com/sitemap.xml

Why Block /search?

Because Blogger automatically creates duplicate URLs like:
https://yourblog.blogspot.com/search/label/SEO

Blocking /search prevents wasted crawl budget and duplicate indexing.


 Robots.txt for Shopify

Shopify auto-generates robots.txt, but you can now edit it.

Example Shopify Robots.txt

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /orders
Disallow: /admin
Sitemap: https://example.com/sitemap.xml

Best Practice

  • Block cart/checkout/order pages.

  • Keep product and category pages open.


 Robots.txt for Wix & Squarespace

  • Both generate robots.txt automatically.

  • You can edit in Site Settings.

  • Ensure duplicate pages, filter URLs, and backend areas are blocked.
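As an illustration only (the parameter name is a placeholder; check which filter or query parameters your Wix/Squarespace templates actually generate), parameter-driven filter URLs can be blocked with wildcard rules:

User-agent: *
Disallow: /*?filter=
Disallow: /*&filter=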


 Robots.txt for Custom CMS

If you’re using a custom-built site:

  • Upload robots.txt manually via FTP or cPanel.

  • Example template:

User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /tmp/
Allow: /
Sitemap: https://example.com/sitemap.xml

  Quick “Do / Don’t” recap


Do
✅ Put robots.txt in the root of every host/subdomain you control.
✅ Use * and $ deliberately for precise matching.
✅ Use paired rules for tricky params (?id= and &id=); see the example after this recap.
✅ Prefer meta/X-Robots-Tag noindex for removal from search results.

Don’t
❌ Put secrets in robots.txt (it advertises them).
❌ Expect bad crawlers to obey robots.txt.
❌ Forget User-agent or the leading / in paths.
❌ Try to control other subdomains from one robots.txt.
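Here is the paired-rules pattern from the recap as a sketch (the id parameter is a placeholder): one rule catches the parameter at the start of the query string, the other catches it after another parameter:

User-agent: *
Disallow: /*?id=
Disallow: /*&id=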

Finally, test your file with a robots.txt validator and testing tool before you rely on it.

Frequently Asked Questions (FAQs)

1. What is a robots.txt file?

Robots.txt is a simple text file placed in the root directory of a website. It gives instructions to search engine crawlers about which pages or sections of the site they can or cannot crawl.


2. Where should I place the robots.txt file?

The robots.txt file must be placed in the root directory of your domain. For example:
✅ https://example.com/robots.txt
❌ https://example.com/folder/robots.txt (a robots.txt in a subfolder is ignored)


3. Does robots.txt block a page from Google completely?

No. Robots.txt prevents crawling but not indexing. If a blocked page is linked from elsewhere, Google may still index its URL (without content). For full control, use the noindex meta tag or HTTP headers.


4. Can robots.txt hide sensitive data?

No. Robots.txt is public, so anyone can view it. To protect sensitive information (like admin or customer data), use password protection, firewalls, or server-side restrictions.


5. What happens if I block CSS and JavaScript in robots.txt?

Blocking CSS/JS prevents Google from rendering your site properly. This can harm rankings since Google evaluates the full user experience. Always allow CSS and JS files.
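If an older setup blocks an includes or theme folder, the cleanest fix is to delete that Disallow; failing that, here is a sketch that re-allows the assets inside it (the /wp-includes/ path is an assumption about your setup):

User-agent: *
Disallow: /wp-includes/
Allow: /wp-includes/*.css$
Allow: /wp-includes/*.js$

(The longer Allow paths beat the shorter Disallow, so stylesheets and scripts stay crawlable.)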


6. Is robots.txt necessary for every website?

Not always. Small websites with only a few pages can work fine without it. However, for blogs, eCommerce sites, or large platforms with many URLs, robots.txt is highly recommended to manage crawl budget efficiently.


7. How do I test my robots.txt file?

You can test it using Google Search Console’s robots.txt Tester. It allows you to check whether specific pages are being blocked or allowed for Googlebot.


8. What is the difference between robots.txt and meta robots tag?

  • Robots.txt → Controls crawling (which pages bots can visit).

  • Meta robots tag → Controls indexing (whether a page should appear in search results).
    Use each for its own job, and remember that Google can only see a noindex tag on a page it is allowed to crawl, so don't Disallow a URL you want de-indexed.


9. Can I use robots.txt for specific search engines only?

Yes. You can set rules for specific crawlers by mentioning their user-agent. Example:

User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /test/

10. What are common mistakes in robots.txt?

  • Accidentally blocking the entire site with Disallow: /

  • Blocking CSS/JS files.

  • Using it to hide sensitive data.

  • Forgetting to add sitemap reference.

  • Having conflicting rules that confuse crawlers.
